مرکز اطلاعات علمی Scientific Information Database (SID) - Trusted Source for Research and Academic Resources

Journal Article

Title:

Comparison of data preprocessing methods for gene expression data of Affymetrix Microarray

Author(s):

ATASHI Hadi

Journal:

Breeding and Improvement of Livestock Journal

Issue Info:

Year:
2023
Volume:
3
Issue:
1
Pages:
5-16

Keywords:

Abstract:

Microarray technology is a powerful technique for measuring the expression levels of large numbers of genes simultaneously. Microarray data contain many sources of noise; therefore, several preprocessing steps are necessary before the raw data can yield accurate analysis results. Preprocessing of microarray data includes background correction, normalization, and summarization steps, each of which can be performed by a large variety of methods. However, the relative impact of these methods on the detection of differentially expressed genes remains to be determined. The aim of this study was to compare the effects of different preprocessing methods on the results of differentially expressed gene detection. The data were downloaded from the NCBI GEO database. The series (GSE) accession number, platform (GPL) accession number, and platform name of the data were GSE56589, GPL18534, and Affymetrix Bovine Genome Array, respectively. Two background correction methods (MAS.5 and RMA.2), two normalization methods (scaling normalization and quantile normalization), and two summarization methods (Tukey biweight and median polish) were evaluated. The results showed that the number and types of differentially expressed genes were affected mainly by the background correction and normalization methods, whereas the summarization method had only a small impact.

Yearly Impact:

Download 10

Citation 0

Reference 0
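
The abstract above names quantile normalization as one of the compared normalization methods. The following is a minimal sketch of the general technique in plain Python, not the implementation used in the study; the function name `quantile_normalize` and the list-of-lists input layout (one list of probe intensities per array) are assumptions made for illustration.

```python
def quantile_normalize(columns):
    """Quantile-normalize equal-length samples (one list of intensities per array).

    Hypothetical helper for illustration; ties are broken by position,
    which is a simplification compared with production implementations.
    """
    n = len(columns[0])
    if any(len(col) != n for col in columns):
        raise ValueError("all samples must have the same number of probes")

    # Reference distribution: mean of the k-th smallest value across samples.
    sorted_cols = [sorted(col) for col in columns]
    reference = [sum(col[k] for col in sorted_cols) / len(columns) for k in range(n)]

    normalized = []
    for col in columns:
        # Indices of this sample ordered from smallest to largest value.
        order = sorted(range(n), key=lambda i: col[i])
        out = [0.0] * n
        for rank, idx in enumerate(order):
            # Replace each value by the reference value of the same rank.
            out[idx] = reference[rank]
        normalized.append(out)
    return normalized
```

After this step every sample shares the same empirical distribution, which is the property that makes intensities comparable across arrays.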

Journal Article

Title:

Explainable Diabetes Prediction via Hybrid Data Preprocessing and Ensemble Learning

Author(s):

Kakavand Teimoory Ghazaleh | Keyvanpour Mohammad Reza | Ghaebi Maryam

Journal:

International Journal of Web Research

Issue Info:

Year:
2025
Volume:
8
Issue:
4
Pages:
51-66

Keywords:

Diabetes Prediction Q4

Abstract:

Accurate and early prediction of diabetes is crucial for initiating prompt treatment and minimizing the risk of long-term health issues. This study introduces a comprehensive machine learning model aimed at improving diabetes prediction by leveraging two clinical datasets: the PIMA Indians Diabetes Dataset and the Early-Stage Diabetes Dataset. The pipeline tackles common challenges in medical data, such as missing values, class imbalance, and feature relevance, through a series of advanced preprocessing steps, including class-specific imputation, engineered feature construction, and SMOTETomek resampling. To identify the most informative predictors, a hybrid feature selection strategy is employed, integrating recursive elimination, Random Forest-based importance, and gradient boosting. Model training uses Random Forest and Gradient Boosting classifiers, which are fine-tuned and combined through weighted ensemble averaging to boost predictive performance. The resulting model achieves 93.33% accuracy on the PIMA Dataset and 98.44% accuracy on the Early-Stage Dataset, outperforming previously reported approaches. To enhance transparency and clinical applicability, both local (LIME) and global (SHAP) explainability methods are applied, highlighting clinically relevant features. Furthermore, probability calibration is performed to ensure that predicted risk scores align with true outcome frequencies, increasing trust in the model’s use for clinical decision support. Overall, the proposed model offers a robust, interpretable, and clinically reliable solution for early-stage diabetes prediction.

Yearly Impact:

Download 0

Citation 0

Reference 0
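
The abstract above mentions combining two classifiers through weighted ensemble averaging. As a hedged sketch of that general idea only, the snippet below averages positive-class probabilities with user-chosen weights. The function name `weighted_average_proba`, the example probabilities, and the 0.6/0.4 weights are all hypothetical and do not come from the paper.

```python
def weighted_average_proba(probas, weights):
    """Combine per-model positive-class probabilities with normalized weights."""
    if not probas or len(probas) != len(weights):
        raise ValueError("need one weight per model")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    n = len(probas[0])
    # Weighted mean of the models' probabilities for each sample.
    return [sum(w * p[i] for w, p in zip(weights, probas)) / total for i in range(n)]


# Hypothetical outputs of two already-trained models for three patients.
rf_proba = [0.9, 0.2, 0.6]  # assumed Random Forest probabilities
gb_proba = [0.7, 0.4, 0.4]  # assumed Gradient Boosting probabilities

combined = weighted_average_proba([rf_proba, gb_proba], weights=[0.6, 0.4])
labels = [int(p >= 0.5) for p in combined]  # threshold at 0.5
```

In practice the weights would be tuned on validation data rather than fixed by hand.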

Journal Article

Title:

Data preprocessing for heart disease classification: A systematic literature review

Author(s):

Journal:

Comput Methods Programs Biomed

Issue Info:

Year:
2020
Volume:
195
Issue:
-
Pages:
0-0

Keywords:

Abstract:

Yearly Impact:

Download 0

Citation 1

Reference 0

Journal Article

Title:

Enhancing Learning from Imbalanced Classes via Data Preprocessing: A Data-Driven Application in Metabolomics Data Mining

Author(s):

BaniMustafa Ahmed

Journal:

THE ISC INTERNATIONAL JOURNAL OF INFORMATION SECURITY

Issue Info:

Year:
2019
Volume:
11
Issue:
3
Pages:
79-89

Keywords:

Abstract:

This paper presents a data mining application in metabolomics. It aims at building an enhanced machine learning classifier that can be used for diagnosing cachexia syndrome and identifying the biomarkers involved. To achieve this goal, a data-driven analysis is carried out using a public dataset consisting of 1H-NMR metabolite profiles. This dataset suffers from the problem of imbalanced classes, which is known to degrade the performance of classifiers and to undermine their validity and generalizability. The classification models in this study were built using five machine learning algorithms: PLS-DA, MLP, SVM, C4.5, and ID3. The models were built after a number of intensive data preprocessing procedures intended to tackle the problem of imbalanced classes and improve the performance of the constructed classifiers. These procedures involve data transformation, normalization, standardization, re-sampling, and data reduction using a number of variable importance scorers. The best performance was achieved by an MLP model that was trained and tested with five-fold cross-validation on datasets that were re-sampled using the SMOTE method and then reduced using the SVM variable importance scorer. This model was successful in classifying samples with excellent accuracy and in identifying the potential disease biomarkers. The results confirm the validity of metabolomics data mining for the diagnosis of cachexia. They also emphasize the importance of data preprocessing procedures such as sampling and data reduction for improving data mining results, particularly when the data suffer from the problem of imbalanced classes.

Yearly Impact:

Download 155

Citation 0

Reference 0
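
The abstract above reports re-sampling with the SMOTE method. The core idea of SMOTE is to create synthetic minority-class points by interpolating between a minority sample and one of its nearest minority neighbours. The sketch below illustrates only that interpolation step in plain Python; the function name `smote_like_oversample`, the default `k`, and the fixed seed are assumptions for illustration, not the paper's setup.

```python
import random


def smote_like_oversample(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority points by squared Euclidean distance to base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])  # one of the k nearest neighbours
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic
```

Each synthetic point lies on a line segment between two real minority samples, so the new points stay inside the region the minority class already occupies.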

Journal Article

Title:

DATA PREPROCESSING FOR RIVER FLOW FORECASTING USING NEURAL NETWORKS: WAVELET TRANSFORMS AND DATA PARTITIONING

Author(s):

CANNAS B.

Journal:

PHYSICS AND CHEMISTRY OF THE EARTH

Issue Info:

Year:
2006
Volume:
31
Issue:
18
Pages:
1164-1171

Keywords:

Abstract:

Yearly Impact:

Download 0

Citation 1

Reference 0

Journal Article

Title:

A PREPROCESSING STAGE BEFORE THE FEATURE EXTRACTION PROCEDURE IN CLASSIFICATION OF HYPERSPECTRAL IMAGES DATA

Author(s):

SALEHI BAHRAM | VALADAN ZOUJ M.J. | SERAJIAN M.R.

Journal:

Issue Info:

Year:
2008
Volume:
42
Issue:
3 (113)
Pages:
327-338

Keywords:

CLASSIFICATION Q2

DIMENSIONALITY REDUCTION Q3

FEATURE EXTRACTION Q2

HYPERSPECTRAL PROJECTION PURSUIT LOWPASS FILTERING

Abstract:

Hyperspectral data potentially contain more information than multispectral data because of their higher spectral resolution. However, the stochastic data analysis approaches that have been applied successfully to multispectral data are not as effective for hyperspectral data. Various investigations indicate that the key cause of poor performance in stochastic approaches to hyperspectral data classification is inaccurate estimation of class parameters. It has been found that the conventional approaches can be retained if a preprocessing stage is established before the feature extraction procedure in the classification of hyperspectral data. This paper proposes a preprocessing stage with two steps: dimensionality reduction and class separability improvement. Sequential Parametric Projection Pursuit was used for dimensionality reduction because of its special characteristics. The Projection Pursuit algorithm computes the class parameter estimates in a lower-dimensional space, which gives better parameter estimation. For class separability improvement, a lowpass filter was applied after dimensionality reduction. This paper shows that, for different numbers of features, classification accuracy improves when the preprocessing stage is applied.

Yearly Impact:

Download 0

Citation 0

Reference 0

Journal Article

Title:

PERFORMANCE ASSESSMENT OF GENE EXPRESSION PROGRAMMING MODEL USING DATA PREPROCESSING METHODS TO MODELING RIVER FLOW

Author(s):

SOLGI A. | ZAREI H. | GOLABI M.R.

Journal:

Journal of Water and Soil Conservation

Issue Info:

Year:
2017
Volume:
24
Issue:
2
Pages:
185-201

Keywords:

DATA PRE-PROCESSING Q1

FLOW MODELING Q1

GENE EXPRESSION PROGRAMMING Q1

WAVELET TRANSFORM

PCA METHOD

Abstract:

Background and Objectives: The increasing demand for water makes planning and management important for controlling future water consumption. River flow prediction, in addition to supporting water resources management, can help anticipate natural disasters such as floods and droughts. Accurate estimation of river flow using different models has therefore been considered by many water resources researchers. Intelligent models have been used to predict river flow. One of these models that has shown appropriate performance is Gene Expression Programming (GEP). Combining intelligent models with other methods has recently gained acceptance, and the wavelet transform is usually used for this purpose.

Materials and Methods: In this study, the GEP model was used to model flow on the daily and monthly scales in the Gamasiyab River. For this purpose, precipitation, temperature, evaporation, and flow data of the Gamasiyab River at Varayeneh Station for the period from 1970 to 2012 were used. To increase the accuracy of the model, two data preprocessing methods, the wavelet transform and principal components analysis (PCA), were used: the primary signal of each input parameter was decomposed using the wavelet transform, then PCA was used to determine the main sub-signals, and the main sub-signals were entered as inputs into the GEP model to produce the Wavelet-Gene Expression Programming (WGEP) model.

Results: Examination of different structures of the GEP model showed that the model performed well on the daily scale, but its performance dropped on the monthly scale. Comparison of the WGEP model with the GEP model showed that the hybrid model performed better than the simple model on both the daily and monthly scales, because of the preprocessing applied to the data. Based on the coefficient of determination, the results of the hybrid model improved by 4% on the daily scale and by 23% on the monthly scale. In addition, given the large number of sub-signals, using PCA increased the running speed.

Conclusion: Preprocessing the data increased the performance of the model, and using PCA as an auxiliary tool for the wavelet transform increased the speed and accuracy of the model. Overall, the results showed that the GEP model combined with the wavelet transform can be used as a suitable tool for modeling and predicting the flow of the Gamasiyab River.

Yearly Impact:

Download 0

Citation 0

Reference 0
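
The abstract above describes decomposing each input signal into sub-signals with the wavelet transform. The abstract does not say which wavelet family was used, so the sketch below assumes the Haar wavelet purely because it is the simplest to write out. The names `haar_step` and `haar_inverse` are hypothetical.

```python
import math


def haar_step(signal):
    """One level of the Haar discrete wavelet transform (even-length input)."""
    if len(signal) % 2 != 0:
        raise ValueError("signal length must be even")
    s = math.sqrt(2.0)
    # Approximation: scaled pairwise sums (smooth, low-frequency sub-signal).
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    # Detail: scaled pairwise differences (high-frequency sub-signal).
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail


def haar_inverse(approx, detail):
    """Rebuild the original signal from one level of Haar coefficients."""
    s = math.sqrt(2.0)
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) / s, (a - d) / s])
    return out
```

Applying `haar_step` again to the approximation gives a deeper decomposition; the resulting sub-signals are the kind of inputs that a method such as PCA could then reduce.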

Seminar Article

Title:

AN EFFICIENT CHEMOMETRIC STRATEGY BASED ON WAVELET TRANSFORM FOR PREPROCESSING OF MULTI-CAPILLARY COLUMN – ELECTRONIC NOSE DATA

Writer:

MASOUM SAEED | ALINOORI AMIR HOSSEIN

Conference:

IRANIAN BIENNIAL CHEMOMETRICS SEMINAR

Issue Info:

Year:
2015
Volume:
5

Keywords:

MULTI CAPILLARY COLUMN CHROMATOGRAPHY

ELECTRONIC NOSE

WAVELET TRANSFORM

SIGNAL DENOISING

Abstract:

THE MULTI-CAPILLARY COLUMN (MCC) THAT IS HYPHENATED WITH ELECTRONIC NOSE (E-NOSE) IS A NOVEL SENSITIVE AND STRONG TECHNIQUE WITH ABILITY TO SEPARATE COMPONENTS OF ODORS TO FACILITATE DETECTION OF THEM AND IT CAN BE USED IN MANY APPLICATIONS. ...

Yearly Impact:

Download 85

Journal Article

Title:

Assessment Effects of Data Preprocessing and Modeling Parameters of Gene Expression Programming on Accuracy of Time Series Forecasting

Author(s):

SALEHI M. | FATEMI S.E.

Journal:

Iranian Journal of Irrigation and Drainage

Issue Info:

Year:
2021
Volume:
15
Issue:
3
Pages:
582-597

Keywords:

Abstract:

A hydrological time series is a time-dependent hydrological variable; finding the pattern of its changes and predicting it is the most important goal of time-series analysis. The purpose of this study is to examine jointly the characteristics of time series, their prediction, and the important GEP parameters for high-precision predictions in training and validation. The study uses the groundwater depth time series of the Chamchamal plain station in Kermanshah province (12-year period, mountainous climate) and the monthly temperature time series of Alaska (50-year period, cold and dry climate). GeneXproTools 5.0 software was used to model the time series with GEP. The results showed that the periodicity present in the temperature time series led to correlations above 90% in the different stages of training and validation, so that the different GEP parameters contributed less than 10% to improving the results. On the other hand, for the groundwater depth time series, which lacks periodicity and has a descending ACF shape, no setting of the GEP parameters yielded R above 44% in validation. This means that time-series preprocessing has a greater impact on the prediction results. By eliminating the semester, the prediction results in all stages of modeling are significantly reduced. In this case, the best R for validation is 50%.

Yearly Impact:

Download 0

Citation 0

Reference 0

Journal Article

Title:

IMPROVING THE PERFORMANCE OF ICA ALGORITHM FOR FMRI SIMULATED DATA ANALYSIS USING TEMPORAL AND SPATIAL FILTERS IN THE PREPROCESSING PHASE

Author(s):

BOROOMAND A. | AHMADIAN A. | OGHABIAN M.A.

Journal:

Iranian Journal of Medical Physics

Issue Info:

Year:
2007
Volume:
4
Issue:
14-15
Pages:
1-18

Keywords:

INDEPENDENT COMPONENT ANALYSIS Q3

FUNCTIONAL MRI Q3

TEMPORAL FILTER

SPATIAL FILTER

Abstract:

Introduction: The accuracy of functional MRI (fMRI) data analysis usually decreases in the presence of noise and artifact sources. A common solution for analyzing fMRI data with high noise is to apply suitable preprocessing methods to denoise the data. Some effects of preprocessing methods on parametric methods such as the general linear model (GLM) have been evaluated previously. In this study, besides comparing simple and noisy Independent Component Analysis (ICA) algorithms, the quantitative effects of some spatial and temporal filters on the functionality of ICA algorithms were evaluated. Noisy ICA algorithms achieve higher accuracy (up to 16%) than simple ICA on noisy fMRI data, although they are more time consuming. The accuracy of the results improves by 8-10% when spatial and temporal filtering is applied prior to simple ICA.

Materials and Methods: Simple ICA and noisy ICA methods were compared for analyzing simulated fMRI data sets. The impact of some temporal and spatial filters on the functionality of simple ICA algorithms was evaluated. The implemented filters fall into low-pass and high-pass groups.

Results: The sensitivity, specificity, and temporal accuracy of simple ICA algorithms improved when high-pass filters were used. Although low-pass filtering has some positive effects on the performance of simple ICA algorithms at low signal-to-noise ratio (SNR) levels, at high SNR levels these low-pass filters may decrease the sensitivity, specificity, and temporal accuracy of simple ICA methods.

Discussion and Conclusion: The results obtained from simple and noisy ICA algorithms for analyzing fMRI data with high SNR levels are approximately similar. The Infomax algorithm, which uses gradient-based methods for estimating the unmixing matrix, has better sensitivity, specificity, and temporal accuracy than FastICA for analyzing noisy ICA data. An alternative to the complicated and time-consuming noisy ICA algorithms is to preprocess and denoise the fMRI data before analyzing it with simple ICA algorithms.

Yearly Impact:

Download 0

Citation 0

Reference 0
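
The abstract above discusses low-pass and high-pass temporal filtering as a preprocessing step, without specifying the filter designs. As a hedged illustration of the distinction only, the sketch below uses a centered moving average as a crude low-pass filter and subtracts it from the series to obtain a crude high-pass filter. The function names and the default window length are assumptions.

```python
def moving_average(series, window):
    """Centered moving average; the window is truncated at the series edges."""
    half = window // 2
    out = []
    for i in range(len(series)):
        lo, hi = max(0, i - half), min(len(series), i + half + 1)
        out.append(sum(series[lo:hi]) / (hi - lo))
    return out


def high_pass(series, window=5):
    """Remove slow drift by subtracting the moving-average (low-pass) trend."""
    trend = moving_average(series, window)
    return [x - t for x, t in zip(series, trend)]
```

A slow baseline drift ends up mostly in the `moving_average` output, while `high_pass` keeps the faster fluctuations around that drift.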

Scientific Information Database

ISSN: 2588-4824
